The development of laying hen locomotion in 3D space is affected by early environmental complexity and genetic strain

Adult laying hens are increasingly housed in spatially complex systems, e.g., non-cage aviaries, where locomotion between elevated structures can be challenging for these gallinaceous birds. This study assessed the effect of early environmental complexity on spatial skills in two genetic strains. Brown (B) and white (W) feathered birds were raised in conventional cages with minimal complexity (Conv) or in rearing aviaries with low (Low), intermediate (Mid), or high (High) complexity. Birds from each housing treatment were challenged at three different time points in three different, age-appropriate vertical spatial tasks. White birds performed better than brown birds in all tests, regardless of rearing environment. In chicks, test performance was predominantly explained by variation between replicates and by differences in motivation to participate in the test. Treatment effects were seen in pubertal birds (pullets), with pullets from aviaries performing better than those from Conv. White High pullets performed better than white Mid or Low pullets, an effect that was not found in browns. Pullets preferred to use a ramp to move downwards, but only when they had previously experienced ramps and when the ramp was not too steep. Overall, early environmental complexity affected the spatial skills of laying hen pullets, with stronger effects in white than in brown feathered birds.


Hurdle test
The hurdle test was conducted in a testing arena in a separate room. The arena consisted of two sections divided by a hurdle (Figure 2). The larger section held a group of three chicks as a social incentive to attract the focal bird across the hurdle. Feed was strewn on the floor to keep the incentive chicks engaged in front of the hurdle, and a mesh barrier prevented them from jumping over the hurdle into the testing section. The testing section was smaller and offered no feed. The hurdles were made of white plastic grid that allowed chicks to see through to the other side but prevented them from passing through.
Each chick was tested in only one of the five set-ups, with 294 chicks tested per set-up. See Supplementary Table 2 for a detailed description of the number of chicks tested by housing system. The two set-ups that included a ramp (difficulties 2 and 4) had ramps at a 30° and a 42° angle, respectively. Chicks of this age have been observed to climb ramps of up to 40° incline without much difficulty, struggling increasingly at steeper angles 4. For flocks 2-4, the sample for each difficulty level consisted of 12 chicks per aviary style and strain combination and 6 chicks of each strain from the conventional cages. In flock 1, only half as many chicks were tested in each group. This resulted in a total of 210 brown and 210 white chicks tested from each aviary style and 105 of each strain from the conventional cages. Testing occurred over four days, with chicks from each housing and strain combination tested in a balanced order on each day. Test difficulty was randomized in flock 1 but, for the three following flocks, was switched to a standardized increase over the testing period to reduce the time spent remodelling the arena and allow more birds to be tested. Three conspecifics served as a social-reinstatement cue to motivate the focal chick to cross the hurdle. To start a test, the focal chick was placed in the testing section (centred, opposite the hurdle; see Figure 2), facing the hurdle. The observation started as soon as the chick was set down and continued for 120 s or until the chick crossed the hurdle. Chicks were observed live by one of two observers, and behaviour was coded with a behaviour coding program on digital tablets (Pocket Observer 3.2, Noldus Information Technology, 2012, on a Samsung Galaxy Tab4 SM-T230NU).
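The sample-size arithmetic above can be checked with a short sketch. It assumes only the numbers stated in the text (12 chicks per aviary style and strain, 6 per strain from conventional cages, per difficulty and flock in flocks 2-4, and half of that in flock 1); the function and constant names are hypothetical.

```python
# Hypothetical check of the hurdle-test sample sizes stated in the text.
AVIARY_STYLES = ["Low", "Mid", "High"]
STRAINS = ["B", "W"]
N_DIFFICULTIES = 5
FULL_FLOCKS = [2, 3, 4]  # flocks tested at full group size
HALF_FLOCKS = [1]        # flock 1 tested at half group size


def per_difficulty(full_n: int) -> int:
    """Chicks per housing x strain group and difficulty, summed over all flocks."""
    return full_n * len(FULL_FLOCKS) + (full_n // 2) * len(HALF_FLOCKS)


def design_totals() -> dict:
    aviary_per_group = per_difficulty(12) * N_DIFFICULTIES  # one aviary style, one strain
    conv_per_strain = per_difficulty(6) * N_DIFFICULTIES    # conventional cages, one strain
    total = (aviary_per_group * len(AVIARY_STYLES) + conv_per_strain) * len(STRAINS)
    return {
        "aviary_per_style_and_strain": aviary_per_group,
        "conv_per_strain": conv_per_strain,
        "total": total,
        "per_setup": total // N_DIFFICULTIES,
    }


print(design_totals())
# {'aviary_per_style_and_strain': 210, 'conv_per_strain': 105, 'total': 1470, 'per_setup': 294}
```

The totals agree with the 210, 105 and 294 chicks given in the text, and the overall total of 1470 matches the number of vocalisation observations reported for all difficulties.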
The observation period started when the chick was placed (both feet on the floor). The behaviours measured were vocalisation bouts, latency to start walking, number of jumping attempts, crossing success, and strategy (ramp vs jumping). Inter-observer reliability was assessed by having the two observers code a total of 13 tests simultaneously. The number of birds tested by housing system and the success for each difficulty are detailed in Supplementary Table 2.

Vertical navigation test
The vertical navigation test was conducted in two test pens, while the focal birds were housed in floor pens for the duration of this test. Every pullet was individually marked with a numbered leg ring and a numbered wing tag. All birds from a given housing treatment were randomly assigned to a pen, with the experimenter blinded to the treatment. All pens were located in the same room; to maintain blinding, the strains were kept on opposite sides of the room, which allowed the locations of treatment groups to be switched with every flock. At the end of the room were two test pens with three platforms each (at 60 cm, 120 cm, and 180 cm; Figure 3). All platforms were made of black rubber grid fixed on metal frames. Habituation. During the first week in the floor pens (week 15 of rearing), pullets were introduced to the food reward and the red reward dish and habituated to the test pens (Figure 3). Habituation proceeded as follows. On day one, feeders were removed from the home pens for 30-60 minutes (min) before each pen received a red dish with high-value foods (sweet corn and live mealworms) mixed into their normal feed. On days two and three, feeders were removed from the pens for 30-60 min before pullets were placed into the test pens in groups of five for 20 min, with two reward dishes holding reward foods on the floor. Before the feeders were returned to the home pens, each pen received one more portion of the food reward in the reward dish on the floor of the home pen to further strengthen the association. On days four and five, after the feeder had been removed for 30-60 min, pullets were placed individually into the test pen for 3 min and their interaction with the reward dish was recorded (avoidance, inspection, eating). On these days, all pens also received reward food twice in their home pens in two reward dishes, one placed on the floor and the other on the elevated platform. This was done to introduce pullets to the option of feeding on the platform.
At the end of day five, pullets that did not meet the inclusion criterion (eating from the reward dish on day four, day five, or both) were excluded from the test. Excluded birds were either not interested in the food reward, in which case the reward could not serve as an incentive to complete the test, or too afraid to eat in the test pen, in which case they would most likely also have been too afraid to complete the test. Due to time constraints, a maximum of 15 pullets per pen could be included in the test; if more than 15 pullets met the inclusion criterion, surplus pullets were excluded at random. Excluded birds remained in the home pens to keep stocking density constant. In total, 459 pullets qualified for testing (Supplementary Table 4). Testing. In the following week (week 16 of rearing), pullets were tested on five days in a modified version of a spatial test performed by Gunnarsson et al. 5. Each pullet was tested individually and completed a different task every day. The tasks increased in difficulty from day to day, so that any frustration experienced from failure would not affect performance in an easier task. To complete a task successfully, pullets had to reach the platform with the food reward, which consisted of a handful of sweet corn and 3-5 mealworms. In tasks 1-4, food rewards were placed on the elevated platforms and pullets were placed on the floor facing away from the reward, to minimise the risk of a pullet reaching the reward by accident while fleeing from the experimenter. In task 1, the reward was at 60 cm, in tasks 2 and 3 at 120 cm, and in task 4 at 180 cm. For task 5, pullets were placed on the highest platform facing the wall, to minimise the risk of a pullet jumping onto the experimenter before they stepped away, and the reward was on the middle platform. Tasks 2 and 4 could be completed in multiple steps, and the lower platforms held reward dishes with only two corn kernels and one mealworm.
These lower dishes were meant to draw the pullet's attention upwards, as no assumption about navigation skills could be made if birds were unaware of the reward. Tests were observed live by two observers, one per test pen, and success, defined as both feet on the platform, was recorded on paper. Additionally, the quality of each jump attempt was scored on a scale of 1-4 (1 = worst, 4 = best), where 1 and 2 describe unsuccessful transitions and 3 and 4 successful ones. If no attempt was made within 5 min, the pullet was returned to her home pen without receiving a score. The number of tested birds by housing system and their success per task are detailed in Supplementary Table 4.
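The inclusion criterion and the cap of 15 pullets per pen can be expressed as a small selection routine. This is a minimal sketch, assuming that feeding observations are available as sets of bird IDs per day; the function name, argument names and the use of Python's `random` module for the random exclusion are illustrative assumptions, not the procedure's actual tooling.

```python
import random

MAX_PER_PEN = 15  # time-constrained maximum number of tested pullets per pen


def select_test_birds(pen_birds, ate_day4, ate_day5, rng):
    """Return the pullets of one pen that will be tested (hypothetical helper).

    pen_birds: list of bird IDs in the pen
    ate_day4, ate_day5: sets of bird IDs observed eating from the reward dish
    rng: random.Random instance used to exclude surplus birds at random
    """
    # Inclusion criterion: ate from the reward dish on day four, day five, or both.
    eligible = [b for b in pen_birds if b in ate_day4 or b in ate_day5]
    # Surplus eligible birds are excluded at random; excluded birds stay in the pen.
    if len(eligible) > MAX_PER_PEN:
        eligible = rng.sample(eligible, MAX_PER_PEN)
    return sorted(eligible)
```

For example, with `rng = random.Random(1)`, a pen of 20 birds in which 18 ate from the dish yields 15 randomly chosen birds, whereas a pen with only 10 eligible birds yields all 10.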

Ramp-choice test
Birds from the vertical navigation test in flocks 1 and 2 were then tested in a ramp-choice test (Figure 4). Two pullets were excluded due to injuries and two more after losing their identifying leg rings. Testing took place over three days with one task per pullet per day. In each task, pullets were given two minutes to transition between the ground and the platform, either by aerial locomotion, by using the ramp, or by a combination of the two ('mixed' strategy). For task '60U', the red reward dish was placed on the platform at 60 cm elevation, and the pullet was placed on the ground facing away from the platform. In task '60D', the reward dish was placed on the ground and the pullet on the platform at 60 cm elevation, facing the wall. For task '120D', the reward dish was on the ground and the pullet was placed on the platform at 120 cm elevation, facing the wall. Tasks 60U and 60D were balanced across days one and two, while task 120D took place on day three for all pullets to avoid a drop in motivation should that task be too difficult to complete. Testing was balanced across time of day, with half the pullets from each treatment group tested in the morning and the other half in the afternoon. The tests were video recorded (Sony Handycam HDR-X240 and HDR-X110, Tokyo, Japan), and the videos were observed by one observer using Noldus Observer XT version 14 (Noldus Information Technology, Wageningen, The Netherlands) in a randomized order generated by an online random number generator (https://www.random.org). Out of the 660 observations, 25 were lost due to a corrupted SD card, leaving 635. The outcome variables recorded were success, latency to succeed, and the locomotion strategy of each attempted transition (aerial, ramp, or mixed). Intra-observer reliability was assessed by the observer re-coding 20 randomly chosen videos. The experimenters and the observer were blinded to housing treatment but not to strain, task, or hypothesis.
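The day assignment of the ramp-choice tasks and the randomized video coding order can be sketched as follows. The sketch assumes a simple list of pullet IDs; it balances 60U and 60D across days one and two (exactly when the number of pullets is even) and fixes 120D on day three. Python's `random.Random` stands in for the online generator used in the study, and all names are hypothetical; balancing by time of day within treatment groups is not modelled here.

```python
import random


def ramp_choice_schedule(pullet_ids, rng):
    """Assign tasks to days 1-3 for each pullet (hypothetical helper).

    Half of the pullets get 60U on day 1 and 60D on day 2, the other half the
    reverse; 120D is always on day 3 so that a possibly too-difficult task
    cannot lower motivation in the earlier tasks.
    """
    ids = list(pullet_ids)
    rng.shuffle(ids)  # random allocation to the two day orders
    half = len(ids) // 2
    schedule = {}
    for i, pid in enumerate(ids):
        first, second = ("60U", "60D") if i < half else ("60D", "60U")
        schedule[pid] = {1: first, 2: second, 3: "120D"}
    return schedule


def video_coding_order(video_ids, rng):
    """Random order in which recorded videos are coded (stand-in for random.org)."""
    order = list(video_ids)
    rng.shuffle(order)
    return order
```

With three tasks per pullet, 220 pullets correspond to the 660 observations mentioned in the text; with `rng = random.Random(0)`, `ramp_choice_schedule(range(220), rng)` places 60U on day one for exactly 110 of them.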

Data processing
Hurdle test. Chick performance in the hurdle test was described by the dependent variables success, strategy, latency to walk, vocalisation, and persistence. Success (binary, yes = 1, no = 0) in the hurdle test was assessed for all difficulties combined. Motivation for social reinstatement was assessed by analysing the latency to approach the hurdle and vocalisation frequency (bouts per minute) for all difficulties (1470 observations), and persistence to cross (number of crossing attempts) for difficulties 1, 3, and 5 only (884 observations). Persistence was restricted to the set-ups without a ramp to minimise the risk of overestimating crossing attempts, as chicks occasionally appeared to 'accidentally' walk onto the ramp. For difficulties 2 and 4, strategy was analysed for chicks that crossed the hurdle as a binary variable, with crossing by jumping coded as 1 and crossing by ramp as 0 (47 observations). Inter-observer reliability was moderate for frequencies (kappa = 0.63) and excellent for durations (kappa = 0.97).
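The coding rules for the hurdle-test variables can be summarised in a short sketch. The record layout (`HurdleTrial`) is hypothetical, and one assumption goes beyond the text: vocalisation frequency is computed as bouts divided by the actual observed duration in minutes (120 s, or less if the chick crossed earlier).

```python
from dataclasses import dataclass
from typing import Optional

RAMP_DIFFICULTIES = {2, 4}           # set-ups with a ramp: strategy is coded
HURDLE_ONLY_DIFFICULTIES = {1, 3, 5}  # set-ups without a ramp: persistence is coded
MAX_DURATION_S = 120.0                # maximum observation time per test


@dataclass
class HurdleTrial:
    """Hypothetical record of one hurdle test."""
    difficulty: int             # 1-5
    crossed: bool               # crossed the hurdle within 120 s
    used_ramp: Optional[bool]   # only meaningful for crossings at difficulties 2 and 4
    vocal_bouts: int
    duration_s: float           # observed time in seconds
    jump_attempts: int


def code_trial(t: HurdleTrial) -> dict:
    """Derive the analysed variables from one trial (sketch, see assumptions above)."""
    if not 0 < t.duration_s <= MAX_DURATION_S:
        raise ValueError("observation duration must be in (0, 120] s")
    row = {
        "success": int(t.crossed),
        # Assumption: bouts per minute of actual observation time.
        "vocal_per_min": t.vocal_bouts / (t.duration_s / 60.0),
    }
    if t.difficulty in HURDLE_ONLY_DIFFICULTIES:
        row["persistence"] = t.jump_attempts  # number of crossing attempts
    if t.difficulty in RAMP_DIFFICULTIES and t.crossed:
        row["strategy_jump"] = 0 if t.used_ramp else 1  # jumping = 1, ramp = 0
    return row
```

A chick at difficulty 2 that crossed via the ramp after 90 s with 6 vocalisation bouts is coded as success 1, 4.0 bouts per minute and strategy 0; no persistence value is recorded because the set-up had a ramp.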
Vertical navigation test. Three dependent variables were used to describe pullet performance in the vertical navigation test: success, jumping quality, and strategy. 'Success' in reaching the final reward was recorded as a binary value per task for all pullets (yes = 1, no = 0). For example, task 2 offered two possible jumps (floor to 60 cm, and 60 cm or floor to 120 cm), with the final reward on the higher platform at 120 cm. A pullet that jumped only to the lower platform would be scored on the quality of that jump but receive a zero for success. An independent variable, 'direction', described the locomotion required to complete a task as either upwards (tasks 1-4) or downwards (task 5). To analyse the quality of vertical navigation, 'jumping quality' scores were averaged over all five tasks. Pullets were assessed only on the jumps they attempted; therefore, birds that made no jump attempt were excluded from this data set. A total of 374 pullets attempted at least one jump.
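The task layout, the 'direction' variable, task success and the averaging of jumping quality can be sketched together. This is a minimal sketch under two assumptions: task success is judged by the height of the platform finally reached, and jumping quality is averaged over all attempted jumps pooled across tasks (the text does not say whether scores were first averaged per task). All names are hypothetical.

```python
from statistics import mean

# (start height, reward height) in cm, as described for tasks 1-5.
# Tasks 2 and 3 share the same heights; what distinguishes them is not encoded here.
TASKS = {1: (0, 60), 2: (0, 120), 3: (0, 120), 4: (0, 180), 5: (180, 120)}


def direction(task: int) -> str:
    """'up' for tasks 1-4, 'down' for task 5."""
    start, reward = TASKS[task]
    return "up" if reward > start else "down"


def task_success(task: int, final_height_cm: int) -> int:
    """1 if the pullet ended on the platform holding the final reward, else 0."""
    return int(final_height_cm == TASKS[task][1])


def jump_succeeded(score: int) -> bool:
    """Quality scores 3-4 describe successful transitions, 1-2 unsuccessful ones."""
    if score not in (1, 2, 3, 4):
        raise ValueError(f"invalid jump quality score: {score}")
    return score >= 3


def mean_jump_quality(scores_by_task: dict):
    """Mean quality over all attempted jumps of one pullet (pooled assumption).

    Returns None for a pullet without any jump attempt, i.e. a bird that is
    excluded from the jumping-quality data set.
    """
    all_scores = [s for scores in scores_by_task.values() for s in scores]
    return mean(all_scores) if all_scores else None
```

In task 2, a pullet that stops on the 60 cm platform gets `task_success(2, 60) == 0`, although its jump to 60 cm still contributes a quality score.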
Ramp-choice test. Three dependent variables were used to assess performance in the ramp-choice tasks: success, latency to succeed, and strategy. 'Success' was defined as a pullet placing both feet on the target vertical level (platform in 60U, ground in 60D and 120D) and was recorded as a binary variable (yes = 1, no = 0; 635 observations). 'Latency to succeed' was defined as the time in seconds between placement of the pullet and success, and included only observations with successful transitions (399 observations). Strategy was recorded for all attempted transitions (453 observations) as a categorical variable with three levels: 'aerial', 'mixed', or 'ramp'. Intra-observer reliability was excellent (kappa = 0.87).
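Reliability in this study was expressed as kappa. As an illustration of what that statistic measures, the sketch below computes plain Cohen's kappa for two paired sequences of categorical codes, for example the strategy assigned to the same transitions in a first and a second coding pass. This is a generic textbook formula, not the algorithm implemented in The Observer XT, which compares time-stamped event sequences.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(codes_a)
    # Observed agreement: share of items given the same code in both passes.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from the marginal frequencies of each code.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both passes used one identical category throughout
    return (observed - expected) / (1 - expected)
```

For the codes `["ramp", "ramp", "aerial", "aerial"]` and `["ramp", "aerial", "aerial", "aerial"]`, observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5.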

Statistical analyses
Statistical analyses were done in R (version 3.5.2) using RStudio and the packages 'lme4', 'car', 'oddsratio', 'emmeans', 'effects', 'gmodels', and 'DHARMa'. Dependent variables in the hurdle test were analysed by fitting generalized linear or linear mixed-effects models (GLMM/LMM) with brooding compartment nested in housing style nested in flock as random effects. Successful crossing of the hurdle ('success'), latency to walk, and vocalisation frequency were analysed on the whole data set. To analyse success, a GLMM was fitted with housing, strain, difficulty, and all their interactions as fixed effects. Latency to walk and vocalisation frequency were analysed with an LMM with housing, strain, and their interaction as fixed effects. For the set-ups with a ramp, strategy (ramp use vs jumping) was analysed for chicks that crossed by fitting a GLMM with a binary outcome variable. For the hurdle-only set-ups, persistence was analysed as the total number of crossing attempts in a GLMM with success, housing, strain, and all their interactions as independent variables. Inter-observer reliability was assessed by calculating kappa statistics in The Observer XT (Noldus, version 14.2) for durations and frequencies.
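One way to write the success model for the hurdle test is shown below, for chick $i$ with housing $h(i)$, strain $s(i)$, difficulty $d(i)$, flock $f(i)$ and brooding compartment $c(i)$, assuming a logit link and normally distributed random intercepts. The notation is illustrative and not taken from the original analysis; in lme4 such a nesting is commonly written as `(1 | flock/housing/compartment)`, but the exact formula used in the study is not given in the text.

```latex
\operatorname{logit}\Pr(Y_i = 1) = \beta_0
  + \alpha_{h(i)} + \gamma_{s(i)} + \delta_{d(i)}
  + (\alpha\gamma)_{h(i)s(i)} + (\alpha\delta)_{h(i)d(i)} + (\gamma\delta)_{s(i)d(i)}
  + (\alpha\gamma\delta)_{h(i)s(i)d(i)}
  + u_{f(i)} + v_{f(i)h(i)} + w_{f(i)h(i)c(i)},
\qquad
u_f \sim \mathcal{N}(0,\sigma_u^2),\quad
v_{fh} \sim \mathcal{N}(0,\sigma_v^2),\quad
w_{fhc} \sim \mathcal{N}(0,\sigma_w^2)
```

The three random intercepts correspond to flock, housing style within flock, and brooding compartment within housing style within flock; the fixed part contains housing, strain, difficulty and all their interactions, as described above.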
Jumping quality in the vertical navigation test (374 data points) was analysed by fitting an LMM, and the probability of success was analysed with a GLMM. Both models had strain, housing, and their interaction as fixed effects; direction was additionally included as a fixed effect in the GLMM.
Success in the ramp-choice test (binary, 635 data points) was analysed by fitting a GLMM, and latency to succeed (in seconds, 399 data points) was log-transformed and analysed with an LMM. Both models had housing, strain, and their interaction as fixed effects.
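The log transformation of latency can be illustrated with a short sketch, assuming the natural logarithm (the base is not stated in the text) and strictly positive latencies. It also shows why estimates from the log scale, once back-transformed, describe a geometric rather than an arithmetic mean. Function names are hypothetical.

```python
import math
from statistics import mean


def log_latencies(latencies_s):
    """Natural-log transform of latencies (s) from successful transitions."""
    if any(x <= 0 for x in latencies_s):
        raise ValueError("latencies must be positive to be log-transformed")
    return [math.log(x) for x in latencies_s]


def back_transformed_mean(latencies_s):
    """exp(mean of log latencies): the geometric mean, in seconds."""
    return math.exp(mean(log_latencies(latencies_s)))
```

For latencies of 4 s and 16 s, the back-transformed mean is 8 s (the geometric mean), whereas the arithmetic mean would be 10 s.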
A combination of non-parametric and descriptive statistics was used to analyse locomotion strategy in the ramp-choice test (453 data points).
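The text does not specify which non-parametric tests were applied to locomotion strategy. As one hypothetical option, a Pearson chi-square test of independence on a contingency table (for example strain by strategy: 'aerial', 'mixed', 'ramp') could be used; the sketch below computes the statistic and its degrees of freedom with the standard library only.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table.

    table: list of rows of counts, e.g. one row per strain and one column per
    strategy. Rows or columns that are entirely zero must be removed first.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; drop empty rows/columns")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof
```

For a 2 x 3 table the test has 2 degrees of freedom, and the p-value is then simply `exp(-stat / 2)`; for other tables, `scipy.stats.chi2_contingency` returns the p-value directly. Because several transitions can come from the same pullet, and because small expected counts (below about 5) make the chi-square approximation unreliable, an exact test or a descriptive summary may be more appropriate for parts of these data.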